A Test Collection for Email Entity Linking
نویسندگان
چکیده
Most prior work on entity linking has focused on linking name mentions found in third-person communication (e.g., news) to broad-coverage knowledge bases (e.g., Wikipedia). A restricted form of domain-specific entity linking has, however, been tried with email, linking mentions of people to specific email addresses. This paper introduces a new test collection for the task of linking mentions of people, organizations, and locations to Wikipedia. Annotation of 200 randomly selected entities of each type from the Enron email collection indicates that domainspecific knowledge bases are indeed required to get good coverage of people and organizations, but that Wikipedia provides good (93%) coverage for the named mentions of locations in the Enron collection. Furthermore, experiments with an existing entity linking system indicate that the absence of a suitable referent in Wikipedia can easily be recognized by automated systems, with NIL precision (i.e., correct detection of the absence of a suitable referent) above 90% for all three entity types.
منابع مشابه
Knowledge Base Population for Organization Mentions in Email
A prior study found that on average there are 6.3 named mentions of organizations found in email messages from the Enron collection, only about half of which could be linked to known entities in Wikipedia (Gao et al., 2014). That suggests a need for collection-specific approaches to entity linking, similar to those have proven successful for person mentions. This paper describes a process for a...
متن کاملCreating and Curating a Cross-Language Person-Entity Linking Collection
To stimulate research in cross-language entity linking, we present a new test collection for evaluating the accuracy of cross-language entity linking in twenty-one languages. This paper describes an efficient way to create and curate such a collection, judiciously exploiting existing language resources. Queries are created by semi-automatically identifying person names on the English side of a ...
متن کاملThe Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution
This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as linked or not linked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...
متن کاملBuilding a Cross-Language Entity Linking Collection in Twenty-One Languages
We describe an efficient way to create a test collection for evaluating the accuracy of cross-language entity linking. Queries are created by semiautomatically identifying person names on the English side of a parallel corpus, using judgments obtained through crowdsourcing to identify the entity corresponding to the name, and projecting the English name onto the non-English document using word ...
متن کاملEstimating the Parameters for Linking Unstandardized References with the Matrix Comparator
This paper discusses recent research on methods for estimating configuration parameters for the Matrix Comparator used for linking unstandardized or heterogeneously standardized references. The matrix comparator computes the aggregate similarity between the tokens (words) in a pair of references. The two most critical parameters for the matrix comparator for obtaining the best linking results a...
متن کامل